## **ASSIGNMENT 6**

Q1 A: Lu Simulations

| Configuration | Core 0 time (ns) | Core 1 time (ns) | Core 2 time (ns) | Core 3 time (ns) |
|---------------|------------------|------------------|------------------|------------------|
| Base configuration: L1_instruction cache: size = 4, associativity = 1; L1_data cache: size = 4, associativity = 1; L2 cache: size = 32, associativity = 4 | 92806161 | 92804812 | 92804860 | 92804812 |
| Random configuration: L1_instruction cache: size = 4, associativity = 2; L1_data cache: size = 8, associativity = 4; L2 cache: size = 2048, associativity = 4 | 92761838 | 92760468 | 92760468 | 92760468 |
| Maximum configuration: L1_instruction cache: size = 32, associativity = 4; L1_data cache: size = 32, associativity = 4; L2 cache: size = 2048, associativity = 8 | 92798890 | 92797536 | 92797509 | 92797536 |
| Best configuration: L1_instruction cache: size = 16, associativity = 2; L1_data cache: size = 32, associativity = 2; L2 cache: size = 2048, associativity = 4 | 89236693 | 89235396 | 89235442 | 89235396 |

Q1 B: Cholesky Simulations

| Configuration | Core 0 time (ns) | Core 1 time (ns) | Core 2 time (ns) | Core 3 time (ns) |
|---------------|------------------|------------------|------------------|------------------|
| Base configuration: L1_instruction cache: size = 4, associativity = 1; L1_data cache: size = 4, associativity = 1; L2 cache: size = 32, associativity = 4 | 190842565 | 190841054 | 190841054 | 190841054 |
| Random configuration: L1_instruction cache: size = 8, associativity = 2; L1_data cache: size = 8, associativity = 2; L2 cache: size = 512, associativity = 8 | 187100554 | 187098794 | 187098665 | 187098665 |
| Maximum configuration: L1_instruction cache: size = 32, associativity = 4; L1_data cache: size = 32, associativity = 4; L2 cache: size = 2048, associativity = 8 | 190707072 | 190705544 | 190705598 | 190705544 |
| Best configuration: L1_instruction cache: size = 16, associativity = 1; L1_data cache: size = 32, associativity = 4; L2 cache: size = 512, associativity = 4 | 182974654 | 182973206 | 182973206 | 182973206 |
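To make the differences between configurations easier to compare, the execution times in the two tables can be expressed as a percentage reduction relative to the base configuration. The short sketch below does this for the Core 0 times only; the dictionary layout and the helper name `improvement_pct` are illustrative choices, not part of the assignment.

```python
# Core 0 execution times (ns), copied from the Q1 A and Q1 B tables.
core0_time_ns = {
    "Lu": {
        "base": 92806161,
        "random": 92761838,
        "maximum": 92798890,
        "best": 89236693,
    },
    "Cholesky": {
        "base": 190842565,
        "random": 187100554,
        "maximum": 190707072,
        "best": 182974654,
    },
}


def improvement_pct(base_ns, other_ns):
    """Percent reduction in execution time relative to the base configuration."""
    return 100.0 * (base_ns - other_ns) / base_ns


for benchmark, times in core0_time_ns.items():
    for name in ("random", "maximum", "best"):
        pct = improvement_pct(times["base"], times[name])
        print(f"{benchmark:9s} {name:8s} {pct:5.2f}% faster than base")
```

On the Core 0 figures, the best configuration comes out roughly 3.8% faster than base for Lu and roughly 4.1% faster for Cholesky.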

Rohan Jhaveri CSU Id: 830962238

## Q2 A

Cholesky simulation for the following configuration:

L1\_instruction cache: size = 32, associativity = 4

L1\_data cache: size = 32, associativity = 4

L2 cache: size = 256, associativity = 8

| Type of Execution | Power (W) | Energy (J) | IPC Core 0 | IPC Core 1 | IPC Core 2 | IPC Core 3 | Time Core 0 (ns) | Time Core 1 (ns) | Time Core 2 (ns) | Time Core 3 (ns) |
|-------------------|-----------|------------|------------|------------|------------|------------|------------------|------------------|------------------|------------------|
| In-Order Execution | 15.75 | 3.0 | 0.46 | 0.45 | 0.44 | 0.46 | 190784478 | 190782902 | 190782902 | 190782902 |
| Out-of-Order Execution | 30.65 | 4.04 | 2.60 | 2.58 | 2.51 | 2.55 | 131811218 | 131808258 | 131808258 | 131808258 |

Lu simulation for the following configuration:

L1\_instruction cache: size = 32, associativity = 4

L1\_data cache: size = 32, associativity = 4

L2 cache: size = 256, associativity = 8

| Type of Execution | Power (W) | Energy (J) | IPC Core 0 | IPC Core 1 | IPC Core 2 | IPC Core 3 | Time Core 0 (ns) | Time Core 1 (ns) | Time Core 2 (ns) | Time Core 3 (ns) |
|-------------------|-----------|------------|------------|------------|------------|------------|------------------|------------------|------------------|------------------|
| In-Order Execution | 15.11 | 1.40 | 0.49 | 0.47 | 0.45 | 0.47 | 92785989 | 92784604 | 92784750 | 92784604 |
| Out-of-Order Execution | 28.60 | 2.20 | 2.29 | 2.19 | 2.09 | 2.18 | 79191672 | 79189268 | 79190424 | 79189268 |

In both the Cholesky and Lu simulations, Out-of-Order execution has higher power and energy consumption, higher IPC, and lower execution time compared to In-Order execution.
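The size of each effect can be read off the tables by taking ratios. The sketch below does this for power, energy, and Core 0 execution time; the data layout and the helper name `ratios` are illustrative, and only Core 0 times are used for brevity.

```python
# Power, energy, and Core 0 execution time copied from the Q2 A tables.
results = {
    "Cholesky": {
        "in_order": {"power_w": 15.75, "energy_j": 3.0, "time_ns": 190784478},
        "out_of_order": {"power_w": 30.65, "energy_j": 4.04, "time_ns": 131811218},
    },
    "Lu": {
        "in_order": {"power_w": 15.11, "energy_j": 1.40, "time_ns": 92785989},
        "out_of_order": {"power_w": 28.60, "energy_j": 2.20, "time_ns": 79191672},
    },
}


def ratios(entry):
    """Return out-of-order vs in-order ratios for one benchmark."""
    io, ooo = entry["in_order"], entry["out_of_order"]
    return {
        # Speedup > 1 means out-of-order finishes sooner.
        "speedup": io["time_ns"] / ooo["time_ns"],
        # Ratios > 1 mean out-of-order costs more.
        "power_ratio": ooo["power_w"] / io["power_w"],
        "energy_ratio": ooo["energy_j"] / io["energy_j"],
    }


for benchmark, entry in results.items():
    r = ratios(entry)
    print(
        f"{benchmark}: speedup {r['speedup']:.2f}x, "
        f"power {r['power_ratio']:.2f}x, energy {r['energy_ratio']:.2f}x"
    )
```

On the Core 0 figures this gives a speedup of about 1.45 for Cholesky and 1.17 for Lu, against a power ratio of roughly 1.9 in both cases.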

Power and Energy Consumption:

Out-of-Order execution requires additional hardware for reordering instructions, which increases power consumption. In these results power roughly doubles while execution time falls by less than half, so total energy consumption also increases.

IPC:

IPC increases because, when an instruction stalls on a dependency, the out-of-order processor executes another independent instruction instead of sitting idle until the dependency is resolved. More instructions therefore complete per cycle.

Execution Time:

In In-Order execution, instructions are executed in program sequence, so when an instruction depends on a result that is not yet available, the processor stalls and does not proceed until that dependency is resolved. In Out-of-Order execution, the processor instead executes later independent instructions while it waits for the dependency to resolve. This reduces execution time for Out-of-Order execution.
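The difference between the two execution models can be illustrated with a toy scheduler. The sketch below is not the simulator used for the results in this report; it assumes a single-issue core, a hypothetical five-instruction program, and made-up latencies, and every dependency is assumed to point to an earlier instruction.

```python
def in_order_cycles(program):
    """Cycles to finish when instructions must issue in program order."""
    finish = []
    next_issue = 0
    for latency, deps in program:
        # An instruction cannot issue until all of its inputs are ready,
        # and it blocks every later instruction while it waits.
        ready = max((finish[d] for d in deps), default=0)
        issue = max(next_issue, ready)
        finish.append(issue + latency)
        next_issue = issue + 1
    return max(finish)


def out_of_order_cycles(program):
    """Cycles to finish when any ready instruction may issue (one per cycle)."""
    finish = [None] * len(program)
    remaining = set(range(len(program)))
    cycle = 0
    while remaining:
        # Issue the oldest instruction whose inputs have all completed.
        for i in sorted(remaining):
            latency, deps = program[i]
            if all(finish[d] is not None and finish[d] <= cycle for d in deps):
                finish[i] = cycle + latency
                remaining.remove(i)
                break
        cycle += 1
    return max(finish)


# Hypothetical program: (latency in cycles, indices of earlier instructions it needs).
program = [
    (4, []),      # 0: long-latency load
    (1, [0]),     # 1: uses the load result
    (1, []),      # 2: independent
    (1, []),      # 3: independent
    (1, [2, 3]),  # 4: uses results of 2 and 3
]

print("in-order cycles:    ", in_order_cycles(program))
print("out-of-order cycles:", out_of_order_cycles(program))
```

On this hypothetical program the in-order model needs 8 cycles and the out-of-order model needs 5, because instructions 2, 3, and 4 run while instruction 1 waits for the load.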

## Conclusion:

Hence, we can conclude that out-of-order execution improves performance, but the trade-off is substantially higher power and energy consumption: power roughly doubles, while Core 0 execution time falls by only about 31% for Cholesky and 15% for Lu.

## Q2 B

IPC depends on cache parameters such as size and associativity, and on core parameters such as issue width. It also varies inversely with clock frequency. IPC is the number of instructions divided by the number of cycles, and memory latency is typically fixed in nanoseconds, so at a lower frequency each memory access costs fewer cycles and the core spends fewer cycles stalled. Hence, to improve the IPC of Cholesky and Lu, we decrease the frequency.
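One way to see why IPC can rise as frequency falls is a simple stall model. The sketch below assumes that the frequency values are in GHz and that memory latency is a fixed number of nanoseconds; the parameter values (`cpi_core`, `misses_per_instr`, `mem_latency_ns`) are hypothetical and are not taken from the simulations, so the printed IPC values are not expected to match the measured ones.

```python
def ipc(freq_ghz, cpi_core=0.35, misses_per_instr=0.01, mem_latency_ns=80.0):
    """Modelled IPC at a given clock frequency (all parameters are assumptions)."""
    # A fixed latency in ns costs (latency_ns * freq_ghz) cycles,
    # so each miss stalls the core for fewer cycles at a lower frequency.
    stall_cycles_per_instr = misses_per_instr * mem_latency_ns * freq_ghz
    return 1.0 / (cpi_core + stall_cycles_per_instr)


def time_per_instr_ns(freq_ghz, **params):
    """Average wall-clock time per instruction under the same model."""
    # cycles per instruction divided by cycles per nanosecond
    return 1.0 / (ipc(freq_ghz, **params) * freq_ghz)


for f in (0.5, 0.2):
    print(f"{f} GHz: IPC {ipc(f):.2f}, {time_per_instr_ns(f):.2f} ns per instruction")
```

Under these assumed parameters, IPC is higher at 0.2 than at 0.5, while the time per instruction is longer, so a higher IPC here does not imply a shorter run.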

Best configuration for Cholesky simulations:

L1\_instruction cache: size = 16, associativity = 1

L1\_data cache: size = 32, associativity = 4

L2 cache: size = 512, associativity = 4

| IPC at frequency | Core 0 | Core 1 | Core 2 | Core 3 |
|------------------|--------|--------|--------|--------|
| 0.5 | 2.56 | 2.58 | 2.51 | 2.60 |
| 0.2 | 2.63 | 2.61 | 2.54 | 2.58 |

Decreasing the frequency from 0.5 to 0.2 increases IPC on Cores 0 to 2 (Core 3 dips slightly, from 2.60 to 2.58). At a lower frequency each cycle is longer, so fixed-latency memory accesses stall the core for fewer cycles and more instructions complete per cycle.

Best configuration for Lu simulations:

L1\_instruction cache: size = 16, associativity = 2

L1\_data cache: size = 32, associativity = 2

L2 cache: size = 2048, associativity = 4

| IPC at frequency | Core 0 | Core 1 | Core 2 | Core 3 |
|------------------|--------|--------|--------|--------|
| 0.5 | 2.28 | 2.29 | 2.23 | 2.30 |
| 0.2 | 2.30 | 2.32 | 2.28 | 2.35 |

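The same applies to Lu. On Core 0 the increase (2.28 to 2.30) is smaller than for Cholesky (2.56 to 2.63), although averaged across the four cores the Lu increase is slightly larger (about 0.04 versus about 0.03).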